Incremental decision tree

Most decision tree methods build a tree from a complete training set in a single batch pass; once built, the tree cannot be revised when new data is acquired later.

Incremental decision trees are built by methods that allow an existing tree to be updated or revised using new, individual data instances. This is useful in several situations: a) the entire dataset is not available when the original tree is built, b) the original dataset is too large to process at once, or c) the characteristics of the data change over time (concept drift). A minimal sketch of the idea follows.
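
To make the contrast with batch induction concrete, here is a minimal sketch of an incremental inducer in Python. It is a toy illustration, not any published algorithm: the update() method folds one labelled instance into the existing tree, and the min_leaf threshold and the median-based test are illustrative placeholders for the statistics a real method would maintain.

    from collections import Counter

    class Leaf:
        def __init__(self):
            self.counts = Counter()   # class label -> frequency seen at this leaf
            self.examples = []        # instances retained so a later split can partition them

    class Split:
        def __init__(self, feature, threshold, left, right):
            self.feature = feature
            self.threshold = threshold
            self.left = left
            self.right = right

    class IncrementalTree:
        """Toy incremental inducer: update() folds one labelled instance into
        the existing tree instead of rebuilding from the complete dataset."""

        def __init__(self, min_leaf=20):
            self.root = Leaf()
            self.min_leaf = min_leaf

        def _descend(self, x):
            # Walk to the leaf that x falls into, remembering the parent link
            # so the leaf can later be replaced by a split in place.
            parent, side, node = None, None, self.root
            while isinstance(node, Split):
                parent = node
                side = "left" if x[node.feature] <= node.threshold else "right"
                node = getattr(node, side)
            return parent, side, node

        def predict(self, x):
            _, _, leaf = self._descend(x)
            return max(leaf.counts, key=leaf.counts.get) if leaf.counts else None

        def update(self, x, y):
            # Incremental step: absorb one (instance, label) pair.
            parent, side, leaf = self._descend(x)
            leaf.counts[y] += 1
            leaf.examples.append((x, y))
            if sum(leaf.counts.values()) >= self.min_leaf and len(leaf.counts) > 1:
                self._try_split(parent, side, leaf)

        def _try_split(self, parent, side, leaf):
            # Naive test selection (median of feature 0). Real methods choose
            # the test from maintained statistics and may restructure existing
            # subtrees when a different test becomes best.
            feature = 0
            values = sorted(x[feature] for x, _ in leaf.examples)
            threshold = values[len(values) // 2]
            left, right = Leaf(), Leaf()
            for x, y in leaf.examples:
                target = left if x[feature] <= threshold else right
                target.counts[y] += 1
                target.examples.append((x, y))
            if not left.examples or not right.examples:
                return                  # degenerate threshold; keep the leaf
            split = Split(feature, threshold, left, right)
            if parent is None:
                self.root = split
            else:
                setattr(parent, side, split)

The methods surveyed below differ chiefly in how they choose tests from the statistics they maintain and in whether they can restructure an already-built subtree when a better test emerges.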

Applications

Incremental methods are used for online learning and for mining data streams, settings in which instances arrive one at a time and the characteristics of the data may change while learning is under way.

Methods

Here is a short list of incremental decision tree methods, organized by their (usually non-incremental) parent algorithms.

CART family

CART (1984)[1] is a nonincremental decision tree inducer for both classification and regression problems, developed in the mathematics and statistics communities. CART traces its roots to AID (1963).[2] Crawford (1989)[3] extended CART to incorporate data incrementally.

ID3/C4.5 family

ID3 (1986)[4] and C4.5 (1993)[5] were developed by Quinlan and have roots in Hunt's Concept Learning System (CLS, 1966).[6] The ID3 family of tree inducers was developed in the engineering and computer science communities. Incremental members of the family include ID4 (1986),[7] Utgoff's ID5 (1988)[8] and ID5R (1989),[9] ITI (1997),[11] which efficiently restructures an existing tree when new instances make a different test preferable, and ID6MDL (2007),[10] a post-pruning variant.

Note: ID6NB (2009)[12] is not incremental.

STAGGER

Schlimmer and Granger's STAGGER (1986)[13] was an early incremental learning system. It was developed to examine concepts that changed over time (concept drift).

VFDT

The Very Fast Decision Tree learner (VFDT, 2000)[14] reduces training time for large incremental data sets by subsampling the incoming data stream: a leaf is converted to an internal node as soon as enough examples have been seen that the Hoeffding bound guarantees, with high probability, that the attribute chosen from the sample is the same one that would be chosen from unbounded data. CVFDT (2001)[15] extends VFDT to time-changing data streams, and Gama, Fernandes, and Rocha (2006)[16] describe further decision-tree methods for mining data streams.
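
The split rule at the heart of VFDT can be sketched in a few lines of Python. This is an illustration of the Hoeffding-bound test only, not the full algorithm; the parameter names are illustrative, value_range is the range of the gain measure (e.g., log2 of the number of classes for information gain), and delta is the acceptable probability of choosing the wrong attribute.

    import math

    def hoeffding_bound(value_range, delta, n):
        # After n independent observations of a variable with the given range,
        # the true mean lies within this radius of the observed mean with
        # probability at least 1 - delta.
        return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

    def should_split(best_gain, second_gain, n, value_range=1.0, delta=1e-7):
        # VFDT-style test: split once the observed advantage of the best
        # attribute over the runner-up exceeds the bound, so the attribute
        # chosen from the subsample agrees, with high probability, with the
        # one that would be chosen from the complete stream.
        return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

    # More examples shrink the bound: a 0.02-bit advantage is not yet
    # significant after 5,000 instances, but it is after 50,000.
    print(should_split(0.12, 0.10, n=5000))    # False
    print(should_split(0.12, 0.10, n=50000))   # True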

OLIN and IFN

Last's OLIN system (2002)[17] performs online classification of nonstationary data streams. It is based on the info-fuzzy network (IFN) methodology,[19] and related info-fuzzy algorithms for mining dynamic data streams are described by Cohen et al. (2008).[18]

References

  1. Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984) Classification and Regression Trees. Belmont, CA: Wadsworth International Group.
  2. Morgan, J. N., & Sonquist, J. A. (1963) Problems in the analysis of survey data, and a proposal. Journal of the American Statistical Association 58, 415-434.
  3. Crawford, S. L. (1989) Extensions to the CART algorithm. International Journal of Man-Machine Studies 31, 197-217.
  4. Quinlan, J. R. (1986) Induction of decision trees. Machine Learning 1(1), 81-106.
  5. Quinlan, J. R. (1993) C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
  6. Hunt, E. B., Marin, J., & Stone, P. J. (1966) Experiments in Induction. New York: Academic Press.
  7. Schlimmer, J. C., & Fisher, D. (1986) A case study of incremental concept induction. Proceedings of the Fifth National Conference on Artificial Intelligence (pp. 496-501). Philadelphia, PA: Morgan Kaufmann.
  8. Utgoff, P. E. (1988) ID5: An incremental ID3. Proceedings of the Fifth International Conference on Machine Learning (pp. 107-120). Morgan Kaufmann.
  9. Utgoff, P. E. (1989) Incremental induction of decision trees. Machine Learning 4, 161-186.
  10. Kroon, M., Korzec, S., & Adriani, P. (2007) ID6MDL: Post-pruning incremental decision trees.
  11. Utgoff, P. E., Berkman, N. C., & Clouse, J. A. (1997) Decision tree induction based on efficient tree restructuring. Machine Learning 29, 5-44.
  12. Appavu, S., & Rajaram, R. (2009) Knowledge-based system for text classification using ID6NB algorithm. Knowledge-Based Systems 22, 1-7.
  13. Schlimmer, J. C., & Granger, R. H., Jr. (1986) Incremental learning from noisy data. Machine Learning 1, 317-354.
  14. Domingos, P., & Hulten, G. (2000) Mining high-speed data streams. Proceedings of KDD 2000 (pp. 71-80). New York, NY: ACM Press.
  15. Hulten, G., Spencer, L., & Domingos, P. (2001) Mining time-changing data streams. Proceedings of KDD 2001 (pp. 97-106). New York, NY: ACM Press.
  16. Gama, J., Fernandes, R., & Rocha, R. (2006) Decision trees for mining data streams. Intelligent Data Analysis 10, 23-45.
  17. Last, M. (2002) Online classification of nonstationary data streams. Intelligent Data Analysis 6(2), 129-147.
  18. Cohen, L., Avrahami, G., Last, M., & Kandel, A. (2008) Info-fuzzy algorithms for mining dynamic data streams. Applied Soft Computing 8, 1283-1294.
  19. Maimon, O., & Last, M. (2000) Knowledge Discovery and Data Mining: The Info-Fuzzy Network (IFN) Methodology. Boston: Kluwer Academic Publishers.
